A VoiceFont Creation Framework for Generating Personalized Voices

Authors

  • Takashi Saito
  • Masaharu Sakamoto
Abstract

This paper presents a new framework for effectively creating VoiceFonts for speech synthesis. A VoiceFont in this paper represents a voice inventory aimed at generating personalized voices. Creating well-formed voice inventories is a time-consuming and laborious task. This has become a critical issue for speech synthesis systems that attempt to synthesize many high-quality voice personalities. The framework we propose here aims to drastically reduce the burden with a twofold approach. First, to substantially enhance the accuracy and robustness of automatic speech segmentation, we introduce a multi-layered speech segmentation algorithm with a new measure of segmental reliability. Second, to minimize the amount of human intervention in the process of VoiceFont creation, we provide easy-to-use functions in a data viewer and compiler to facilitate checking and validation of the automatically extracted data. We conducted experiments to investigate the accuracy of the automatic speech segmentation and its robustness to speaker and style variations. The results of the experiments on six speech corpora with a fairly large variation of speaking styles show that the speech segmentation algorithm is quite accurate and robust in extracting segments of both phonemes and accentual phrases. In addition, to subjectively evaluate VoiceFonts created by using the framework, we conducted a listening test for speaker recognizability. The results show that the voice personalities of synthesized speech generated by the VoiceFont-based speech synthesizer are fairly close to those of the donor speakers.

Key words: personalized voice, voice font, voice inventory generation, automatic segmentation, corpus-based speech synthesis, speaker recognizability

Similar resources

A method of creating a new speaker's voicefont in a text-to-speech system

This paper presents a method of creating a new speaker’s voice database (VoiceFont) by which the voice of the donor speaker can be synthesized for mimicking in a text-to-speech system. A VoiceFont creation system, “VoiceFont Builder”, is developed to make the creation process easier and more effective than current systems. The voice feature extraction applied in the system is a simple but power...

Evaluation of speaker mimic technology for personalizing SGD voices

In this paper, we demonstrate the use of state-of-the-art speech technology to transform speech from a source speaker to mimic a particular target speaker with the intention of providing personalized voices to users of Speech Generating Devices (SGDs). This speaker mimicry (SM) capability allows us to use high-quality acoustic inventories from professional speakers and transform them to a differe...

MIVOQ-PTTS - A Revolutionary New Way of Thinking TTS

MIVOQ-PTTS is a new TTS project whose goal is to offer innovative services suitable for creating and using personalized synthetic voices. Users can autonomously create their own synthetic voices by accessing a web interface and recording some sentences; the voice creation procedure does not require any other human intervention. In this work we will introduce MIVOQ-PTTS main ideas and we will illu...

Towards Personalised Synthesised Voices for Individuals with Vocal Disabilities: Voice Banking and Reconstruction

When individuals lose the ability to produce their own speech, due to degenerative diseases such as motor neurone disease (MND) or Parkinson’s, they lose not only a functional means of communication but also a display of their individual and group identity. In order to build personalized synthetic voices, attempts have been made to capture the voice before it is lost, using a process known as v...

Speaker recognizability evaluation of a voicefont-based text-to-speech system

We have developed a new text-to-speech system based on the VoiceFont technology. A VoiceFont is a voice dictionary for speech synthesis that holds the acoustic and prosodic characteristics extracted from the voice corpus of a speaker. The text-to-speech system using a VoiceFont is able to synthetically mimic the voice of the donor speaker. In this paper, we evaluated speaker recognizability of ...

Journal title:
  • IEICE Transactions

Volume 88-D

Publication date: 2005